Exploratory Analysis: Clustering of MFPCA scores


Clustering

To group the curves into clusters, the first 10 PC scores are clustered with the k-means algorithm. To help choose an appropriate number of clusters, Figure 1 shows an elbow plot for \(k \in \{1,...,10\}\):

Figure 1: Elbow plot for k-means clustering for \(k \in \{1,...,10\}\).
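
As a rough illustration of how an elbow curve such as the one in Figure 1 can be computed, the following sketch runs a plain Lloyd-type k-means for \(k \in \{1,...,10\}\) and records the within-cluster sum of squares (WSS). The score matrix here is synthetic stand-in data, and the function names (`kmeans_wss`, `elbow_curve`) as well as the number of random restarts are assumptions made for illustration; this is not the implementation used for the analysis.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two score vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_wss(points, k, n_iter=100, seed=0):
    """One run of Lloyd's algorithm; returns (labels, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = None
    for _ in range(n_iter):
        # Assignment step: each curve goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    wss = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, wss


def elbow_curve(points, k_values, n_init=5):
    """Smallest WSS over n_init random starts for each k."""
    return {
        k: min(kmeans_wss(points, k, seed=s)[1] for s in range(n_init))
        for k in k_values
    }


# Synthetic stand-in for the score matrix (hypothetical data):
# 4 groups of 30 curves, 10 scores per curve.
rng = random.Random(42)
scores = [
    [rng.gauss(5.0 * g, 1.0) for _ in range(10)]
    for g in range(4)
    for _ in range(30)
]

wss_by_k = elbow_curve(scores, range(1, 11))
for k, wss in wss_by_k.items():
    print(k, round(wss, 1))
```

Plotting `wss_by_k` against \(k\) gives the elbow plot; the "elbow" is the value of \(k\) beyond which additional clusters reduce the WSS only marginally.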

The elbow plot suggests an elbow at \(k=2\). However, because the data comprise four scenarios, \(k=4\) is used in the following. The four clusters are rather unbalanced and dominated by two large clusters, namely clusters 1 and 4, as the following table shows:

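Number of curves in each cluster after 126 years of recovery

|           | Control | SSP1-RCP2.6 | SSP3-RCP7.0 | SSP5-RCP8.5 | Sum  |
|-----------|---------|-------------|-------------|-------------|------|
| Cluster 1 | 47      | 177         | 204         | 224         | 652  |
| Cluster 2 | 83      | 50          | 55          | 49          | 237  |
| Cluster 3 | 68      | 46          | 58          | 65          | 237  |
| Cluster 4 | 236     | 169         | 145         | 127         | 677  |
| Sum       | 434     | 442         | 462         | 465         | 1803 |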
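
The margins of the table can be re-derived from the cell counts, which also makes the imbalance between clusters explicit. The sketch below uses the counts exactly as reported above; the variable names are chosen for illustration only.

```python
# Cell counts copied from the table above (rows: clusters, columns: scenarios).
scenarios = ["Control", "SSP1-RCP2.6", "SSP3-RCP7.0", "SSP5-RCP8.5"]
counts = {
    "Cluster 1": [47, 177, 204, 224],
    "Cluster 2": [83, 50, 55, 49],
    "Cluster 3": [68, 46, 58, 65],
    "Cluster 4": [236, 169, 145, 127],
}

# Row margins: number of curves per cluster.
row_sums = {cluster: sum(row) for cluster, row in counts.items()}
# Column margins: number of curves per scenario.
col_sums = {
    scenario: sum(row[i] for row in counts.values())
    for i, scenario in enumerate(scenarios)
}
total = sum(row_sums.values())

# Share of all curves that falls into each cluster.
shares = {cluster: n / total for cluster, n in row_sums.items()}
for cluster, share in shares.items():
    print(f"{cluster}: {row_sums[cluster]} curves ({share:.1%})")
```

With these counts, the two largest clusters together hold roughly 74% of the 1803 curves.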

To get a first impression of how the PFTs behave within each cluster, Figure 2 shows the PFT-wise mean shares of above ground carbon over the first 100 years of recovery. Note that the clusters are derived from the whole time span of 126 years of recovery, with all scenarios combined.


Figure 2: PFT-wise mean shares of above ground carbon over time. Note that the values are averages over locations and clusters.

Figure 3 shows the spatial distribution of the clusters:


Figure 3: Spatial distribution of the clusters for each scenario.

Cluster development over time

Theory in vegetation science states that the vegetation composition after a disturbance is already set shortly after the disturbance. In statistical terms, this would mean that the cluster assignment should be rather similar regardless of how many years of recovery are considered. Figure 4 shows the cluster assignments for all four scenarios over different numbers of years after disturbance:


Figure 4: Cluster composition over years after disturbance for all four scenarios.

The final cluster composition is (nearly) reached after about 60 years, depending on the scenario. Drastic changes occur mainly in the first 20 years, and the clusters quickly stabilize in their core grid cells.

Figure 5 shows the corresponding Adjusted Rand Index (ARI) values, which measure how closely two cluster assignments coincide:


Figure 5: ARI of cluster composition over years after disturbance for all four scenarios.
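
For reference, the ARI between two cluster assignments can be computed from their contingency table as sketched below. It equals 1 for identical partitions (regardless of how the clusters are numbered) and is close to 0 for assignments that agree no more than expected by chance. The function name and the toy label vectors are illustrative assumptions and do not come from the analysis.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same objects."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same objects")
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two objects: nothing to disagree on

    # Pairs of objects placed together in both partitions (contingency cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each partition separately (margins).
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # both partitions are trivial (one cluster or all singletons)
    return (sum_cells - expected) / (max_index - expected)


# Toy example (hypothetical labels): assignment after a short recovery period
# compared with the assignment based on the full trajectory.
labels_early = [0, 0, 1, 1, 2, 2, 3, 3]
labels_final = [1, 1, 0, 0, 2, 3, 3, 3]
print(adjusted_rand_index(labels_early, labels_final))
```

Because the ARI is invariant to relabeling, it can compare k-means solutions obtained for different time horizons without first matching cluster numbers.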

To investigate these patterns further, Figure 6 shows the clustering for individual years, i.e. based not on the whole trajectory up to the respective year but on the PC scores derived for that year only.

Figure 6: Cluster composition for individual years after disturbance for all four scenarios.

Figure 7 again shows the corresponding ARI values, here for the separate scenarios and for all of them combined:


Figure 7: ARI of cluster composition for individual years after disturbance for all four scenarios.

Soil properties

Figure 8 shows some soil properties per cluster and scenario. Note that the data is missing for scenario SSP3-RCP7.0.


Figure 8: Soil properties for each cluster and three scenarios.